-
Notifications
You must be signed in to change notification settings - Fork 2.2k
rpcserver: resolve root cause of premature wallet rescanning #10280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rpcserver: resolve root cause of premature wallet rescanning #10280
Conversation
Summary of ChangesHello @mohamedawnallah, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This PR optimizes neutrino rescans by integrating the wallet's birthday into the rescan logic, significantly speeding up blockchain synchronization for newly created wallets. It modifies Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces an optimization for neutrino rescan by using the wallet's birthday. The changes are well-contained and the logic is sound. I've identified a few areas for improvement, mainly related to code formatting and documentation, to align with the repository's style guide.
0f26e5b to
dae9086
Compare
|
Can we get a brief recap in the OP re: what happened, was it a regression, why this fixes it? Another follow up: if the regression was introduced in How can we develop a regression test for this issue? |
|
Hi @shocknet-justin, Would it be possible to try this potential patch if that resolves your issue (#10180)? I can try to write a reproducible test (admittedly not the easiest thing to do) but having +1/-1 that this changeset works/not works can be helpful |
|
Thank you for looking into this one @mohamedawnallah. I believe I'm experiencing the same issue as @shocknet-justin. I tried running your patches, but it seems like there's still weird behavior on first run. The issue can be seen with: Let me know if I can help you or perhaps test something. |
|
I've got some time sensitive materials I need to finish today but will test in the next day or two. It looks like Hampus is still reproducing it however, the --noseedbackup flag I believe should be triggering the wallet create as quickly as my deployment scripts would. |
dae9086 to
16e24f1
Compare
Benchmarking ReportWith the changeset in this PRWe bring LND v0.20.0 latest release candidate to normal initial sync times that was previously taking days to complete under ideal environmental conditions now takes just over a minute. Benchmarks in Raw Formatdev@dev:~/lnd$ sudo ./compare_lnd_versions.sh
==========================================
LND Version Comparison Benchmark
==========================================
Comparing 4 versions:
- v0.18.3-beta
- v0.18.4-beta
- v0.18.5-beta
- v0.20.0-beta.rc1-after
Each version will run in isolated namespaces
for fair comparison.
>>> Running v0.18.3-beta (1/4)...
==========================================
Benchmarking LND v0.18.3-beta
Using temporary storage: /tmp/lnd-bench-v0.18.3-beta-tQ9SWS
Resource Limits:
- CPU: 4 cores
- Memory: 8G
==========================================
Starting LND in isolated namespace with tmpfs...
Started LND (PID: 851327)
Monitoring sync status...
To manually check status, run:
sudo nsenter --target 851327 --pid --mount --uts --ipc \
/home/dev/lnd/lncli-debug-v0.18.3-beta --lnddir=/tmp/lnd_data_ns/lnd getinfo
[Sample 0] Block: unknown | synced: false | Memory: 348.76 MB | CPU: 0%
[Sample 1] Block: unknown | synced: false | Memory: 437.65 MB | CPU: 223.00%
[Sample 2] Block: 150000 | synced: false | Memory: 119.67 MB | CPU: 190.00%
[Sample 3] Block: 243000 | synced: false | Memory: 131.54 MB | CPU: 158.00%
[Sample 4] Block: 337000 | synced: false | Memory: 150.61 MB | CPU: 171.00%
[Sample 5] Block: 412000 | synced: false | Memory: 160.77 MB | CPU: 151.00%
[Sample 6] Block: 464000 | synced: false | Memory: 174.42 MB | CPU: 143.00%
[Sample 7] Block: 536000 | synced: false | Memory: 193.35 MB | CPU: 157.00%
[Sample 8] Block: 635000 | synced: false | Memory: 207.39 MB | CPU: 167.00%
[Sample 9] Block: 700000 | synced: false | Memory: 217.50 MB | CPU: 153.00%
[Sample 10] Block: 764000 | synced: false | Memory: 242.05 MB | CPU: 154.00%
[Sample 11] Block: 822000 | synced: false | Memory: 253.53 MB | CPU: 152.00%
[Sample 12] Block: 914000 | synced: false | Memory: 278.78 MB | CPU: 144.00%
synced_to_chain
✓ synced_to_chain: true (block height: 918649)
Stopping LND...
========================================
BENCHMARK RESULTS - v0.18.3-beta
========================================
Execution Time: 70.434696628 seconds
Peak Memory Usage: 437.65 MB
Average Memory Usage: 231.76 MB
Average CPU Usage: 157.76%
Samples Collected: 14
Results saved to: benchmark_v0.18.3-beta.json
Temporary directory will be cleaned up automatically
Cleaning up temporary directory: /tmp/lnd-bench-v0.18.3-beta-tQ9SWS
Waiting 10 seconds before next benchmark...
>>> Running v0.18.4-beta (2/4)...
==========================================
Benchmarking LND v0.18.4-beta
Using temporary storage: /tmp/lnd-bench-v0.18.4-beta-2Z0bCV
Resource Limits:
- CPU: 4 cores
- Memory: 8G
==========================================
Starting LND in isolated namespace with tmpfs...
Started LND (PID: 852086)
Monitoring sync status...
To manually check status, run:
sudo nsenter --target 852086 --pid --mount --uts --ipc \
/home/dev/lnd/lncli-debug-v0.18.4-beta --lnddir=/tmp/lnd_data_ns/lnd getinfo
[Sample 0] Block: unknown | synced: false | Memory: 363.35 MB | CPU: 0%
[Sample 1] Block: unknown | synced: false | Memory: 454.38 MB | CPU: 240.00%
[Sample 2] Block: 197000 | synced: false | Memory: 118.01 MB | CPU: 183.00%
[Sample 3] Block: 275000 | synced: false | Memory: 132.96 MB | CPU: 158.00%
[Sample 4] Block: 351000 | synced: false | Memory: 150.52 MB | CPU: 153.00%
[Sample 5] Block: 432000 | synced: false | Memory: 163.67 MB | CPU: 158.00%
[Sample 6] Block: 508000 | synced: false | Memory: 178.56 MB | CPU: 164.00%
[Sample 7] Block: 583000 | synced: false | Memory: 190.03 MB | CPU: 156.00%
[Sample 8] Block: 635000 | synced: false | Memory: 209.26 MB | CPU: 148.00%
[Sample 9] Block: 705000 | synced: false | Memory: 226.96 MB | CPU: 151.00%
[Sample 10] Block: 791000 | synced: false | Memory: 248.11 MB | CPU: 161.00%
[Sample 11] Block: 816000 | synced: false | Memory: 255.58 MB | CPU: 132.00%
[Sample 12] Block: 918000 | synced: false | Memory: 251.60 MB | CPU: 132.00%
synced_to_chain
✓ synced_to_chain: true (block height: 918649)
Stopping LND...
========================================
BENCHMARK RESULTS - v0.18.4-beta
========================================
Execution Time: 70.358434022 seconds
Peak Memory Usage: 454.38 MB
Average Memory Usage: 234.59 MB
Average CPU Usage: 157.61%
Samples Collected: 14
Results saved to: benchmark_v0.18.4-beta.json
Temporary directory will be cleaned up automatically
Cleaning up temporary directory: /tmp/lnd-bench-v0.18.4-beta-2Z0bCV
Waiting 10 seconds before next benchmark...
>>> Running v0.18.5-beta (3/4)...
==========================================
Benchmarking LND v0.18.5-beta
Using temporary storage: /tmp/lnd-bench-v0.18.5-beta-XHFt2i
Resource Limits:
- CPU: 4 cores
- Memory: 8G
==========================================
Starting LND in isolated namespace with tmpfs...
Started LND (PID: 852853)
Monitoring sync status...
To manually check status, run:
sudo nsenter --target 852853 --pid --mount --uts --ipc \
/home/dev/lnd/lncli-debug-v0.18.5-beta --lnddir=/tmp/lnd_data_ns/lnd getinfo
[Sample 0] Block: unknown | synced: false | Memory: 352.79 MB | CPU: 0%
[Sample 1] Block: unknown | synced: false | Memory: 475.08 MB | CPU: 244.00%
[Sample 2] Block: 207000 | synced: false | Memory: 116.33 MB | CPU: 180.00%
[Sample 3] Block: 256000 | synced: false | Memory: 138.87 MB | CPU: 144.00%
[Sample 4] Block: 347000 | synced: false | Memory: 156.42 MB | CPU: 163.00%
[Sample 5] Block: 434000 | synced: false | Memory: 166.09 MB | CPU: 163.00%
[Sample 6] Block: 514000 | synced: false | Memory: 179.01 MB | CPU: 161.00%
[Sample 7] Block: 571000 | synced: false | Memory: 190.89 MB | CPU: 150.00%
[Sample 8] Block: 635000 | synced: false | Memory: 208.35 MB | CPU: 143.00%
[Sample 9] Block: 721000 | synced: false | Memory: 221.60 MB | CPU: 163.00%
[Sample 10] Block: 795000 | synced: false | Memory: 237.88 MB | CPU: 162.00%
[Sample 11] Block: 854000 | synced: false | Memory: 262.40 MB | CPU: 156.00%
synced_to_chain
✓ synced_to_chain: true (block height: 918649)
Stopping LND...
========================================
BENCHMARK RESULTS - v0.18.5-beta
========================================
Execution Time: 65.119188935 seconds
Peak Memory Usage: 475.08 MB
Average Memory Usage: 230.24 MB
Average CPU Usage: 162.00%
Samples Collected: 13
Results saved to: benchmark_v0.18.5-beta.json
Temporary directory will be cleaned up automatically
Cleaning up temporary directory: /tmp/lnd-bench-v0.18.5-beta-XHFt2i
Waiting 10 seconds before next benchmark...
>>> Running v0.20.0-beta.rc1-after (4/4)...
==========================================
Benchmarking LND v0.20.0-beta.rc1-after
Using temporary storage: /tmp/lnd-bench-v0.20.0-beta.rc1-after-LeHuC4
Resource Limits:
- CPU: 4 cores
- Memory: 8G
==========================================
Starting LND in isolated namespace with tmpfs...
Started LND (PID: 854003)
Monitoring sync status...
To manually check status, run:
sudo nsenter --target 854003 --pid --mount --uts --ipc \
/home/dev/lnd/lncli-debug-v0.20.0-beta.rc1-after --lnddir=/tmp/lnd_data_ns/lnd getinfo
[Sample 0] Block: unknown | synced: false | Memory: 356.35 MB | CPU: 0%
[Sample 1] Block: unknown | synced: false | Memory: 489.78 MB | CPU: 249.00%
[Sample 2] Block: 207000 | synced: false | Memory: 144.97 MB | CPU: 222.00%
[Sample 3] Block: 278000 | synced: false | Memory: 172.72 MB | CPU: 203.00%
[Sample 4] Block: 355000 | synced: false | Memory: 207.43 MB | CPU: 221.00%
[Sample 5] Block: 436000 | synced: false | Memory: 242.13 MB | CPU: 224.00%
[Sample 6] Block: 510000 | synced: false | Memory: 269.44 MB | CPU: 216.00%
[Sample 7] Block: 577000 | synced: false | Memory: 303.05 MB | CPU: 222.00%
[Sample 8] Block: 657000 | synced: false | Memory: 324.72 MB | CPU: 228.00%
[Sample 9] Block: 721000 | synced: false | Memory: 354.95 MB | CPU: 214.00%
[Sample 10] Block: 771000 | synced: false | Memory: 378.89 MB | CPU: 208.00%
[Sample 11] Block: 795000 | synced: false | Memory: 405.87 MB | CPU: 183.00%
[Sample 12] Block: 872000 | synced: false | Memory: 392.42 MB | CPU: 227.00%
synced_to_chain
✓ synced_to_chain: true (block height: 918649)
Stopping LND...
========================================
BENCHMARK RESULTS - v0.20.0-beta.rc1-after
========================================
Execution Time: 70.416534861 seconds
Peak Memory Usage: 489.78 MB
Average Memory Usage: 308.40 MB
Average CPU Usage: 208.15%
Samples Collected: 14
Results saved to: benchmark_v0.20.0-beta.rc1-after.json
Temporary directory will be cleaned up automatically
Cleaning up temporary directory: /tmp/lnd-bench-v0.20.0-beta.rc1-after-LeHuC4
==========================================
COMPARISON RESULTS
==========================================
Metric | v0.18.3-beta | v0.18.4-beta | v0.18.5-beta | v0.20.0-beta.rc1-after
------------------------- | ----------------- | ----------------- | ----------------- | -----------------
Execution Time (s) | 70.434696628 | 70.358434022 | 65.119188935 | 70.416534861
Peak Memory (MB) | 437.65 | 454.38 | 475.08 | 489.78
Average Memory (MB) | 231.76 | 234.59 | 230.24 | 308.4
Average CPU (%%) | 157.76 | 157.61 | 162 | 208.15
Summary:
--------
✓ Fastest: v0.18.5-beta (65.119188935s)
✓ Lowest Peak Memory: v0.18.3-beta (437.65 MB)
✓ Lowest Average CPU: v0.18.4-beta (157.61%)
Detailed logs available in:
lnd_v0.18.3-beta.log
lnd_v0.18.4-beta.log
lnd_v0.18.5-beta.log
lnd_v0.20.0-beta.rc1-after.logConclusionsExecution time for v0.20.0-beta.rc1 with the changeset in this PR remains competitive at 70.42 seconds, matching the non-regression performance of v0.18.3-beta and v0.18.4-beta (70.43s and 70.36s respectively) while being only 8.1% slower than v0.18.5-beta (65.12s). Most importantly, this represents a dramatic recovery from the multi-day sync times that plagued v0.19.0-beta, v0.19.1-beta, v0.19.2-beta, v0.19.3-beta, and v0.20.0 current release candidates - what previously took days now completes in just over a minute, making LND v0.20.0 fully operational again. Constraints
Reproducibility
Resources
|
|
cc @hsjoberg, @shocknet-justin Pushed a new changeset and it will solve the wallet sync issues you experienced and this one as well #9081 |
|
@mohamedawnallah I think this partially solved the issue, in that lnd reports syncd_to_chain=true after headers complete, which should unblock our deployments, but what I understand to be the rescan is still taking place after headers sync:
|
It took me two hours to identify the underlying root cause. The changeset I submitted was addressing the symptom rather than the core issue. I'll submit a new changeset tomorrow that fully resolves the root cause. Thanks for being responsive during the resolution process! |
rpcserver.go
Outdated
| // Sync state transitions as dispatcher catches up: | ||
| // Gap >10 blocks: synced (ignore dispatcher lag during catchup) | ||
| // Gap 1-10 blocks: not synced (strict: must be at exact tip) | ||
| // Gap 0 blocks: synced (dispatcher caught up) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blockbeatDispatcher is receiving blocks and dispatching them to blockbeat subscribers, I don't think it's safe to assume we've synced. It's likely that one of the subsystems is still busy processing the block, causing the blockbeat to stall. We should instead identify what's causing the stall fix properly fix it, other than giving user the false impression that lnd is synced.
|
Okay today's experiments haven't come to a meaningful conclusion fix. I am 100% confident that is the main changeset that introduced this regression is this one: 16a8b623b. I guess it is more of race condition/not up-to-date height value after the chain notifier moved from its place to there. Building ANY LND before that commit will NOT TRIGGER any premature wallet rescanning. Will try tomorrow with different experiments :) |
16e24f1 to
a93c2d3
Compare
|
Alrighty, I have submitted a new changeset and It will solve the premature rescanning issue. Something you will notice for now when trying this is that the LND RPC server will not start till the headers will be fully synced to chain that means issuing any RPC calls/commands against LND won't be responsive while headers syncing. That can be noticed on first run, on subsequent runs is negligble unless there is a big timing difference between those runs. I will put later today a root cause analysis report that will help answering the following genuine questions:
For now you can experiment with this changeset if that solves your wallet sync issues |
|
Our workaround is already to no not create until its done, but that's not ideal because its slowing the on-boarding flow relative to prior versions, though its not the end of the world either since its usually only by a minute or two The lack of rpc at all is a bigger problem since we have to parse logs to know how things are progressing, otherwise we're polling into the void... and both options can be flakey An rpc for the state that is available immediately would solve the latter problem |
|
I understand this is not the ideal solution. We are iterating on what the ideal solution could look like. Our process has been:
Why does the current changeset move headers syncing before server start?Let me explain by comparing the old behavior (non-regression) with the new behavior. Old Behavior (Non-Regression)We can use commit 1dfb5a0 as a representative example of the LND version where premature rescanning was not happening. In that version, the sequence was:
References:
New Behavior (Regression)With commit 16a8b62, the order of operations is FLIPPED:
This means the height that Chain Notifier accesses comes from References:
Why Not Return to the Old Order?A natural question: Why not sync first, then call Chain Notifier rescanning in a separate goroutine so it has access to recent height (i.e., move it out of That's a good question. I don't really know the answer, and I may need @yyforyongyu's input on this. If it turns out that Reference:
|
The wallet syncing is started before the RPC server starts. By searching Line 628 in 6ade31d
and here we just check whether the wallet is sycned or not, Lines 706 to 713 in 6ade31d
When the node starts, it will first get the current block height, and pass it to other subsystems, Line 747 in 6ade31d
Then other subsystems can get notified about spending of txns. There is a big improvement we can make, which is layed out here. The idea is that, the two rescan processes - one from the wallet and the other from the notifier can be combined into one, such that we only need to sync blocks once. I think once that's done the startup will be way faster. |
What I meant here by syncing here is full syncing. If running LND in debug mode, someone will see that actually when we reach here we are at ~100,000 headers. The following are the code paths where actually full syncing happens: Lines 694 to 751 in 1dfb5a0
I am sure and confident as currently in the codebase this
It looks like the rescan process holding a state and running them in parallel then converging them into one may not be safe and can potentially introduce a race condition @yyforyongyu – I would love to hear you thoughts what is the best take this PR can safely take given my responses above and time constraint for LND release? |
|
Thank you for the analysis of the root cause behind this issue, @mohamedawnallah. Regarding starting the main RPC server after syncing is done (a93c2d3); I just tested this with Blixt Wallet and (as expected) the RPC is not reachable before the chain is synced. This kinda breaks Blixt as we let the user play around with the app while it is syncing -- for example the user can send onchain so that they can be onboarded quicker. We also use We would really like to be able to use the RPC server before the chain has been synced. The current fix can't really work for us. Is there any other way we can solve this one? |
a93c2d3 to
2fc8729
Compare
| // TODO(yy): break the server startup into four steps, | ||
| // 1. init the low-level services. | ||
| // 2. start the low-level services. | ||
| // 3. init the high-level services. | ||
| // 4. start the high-level services. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to be a planned future improvement. If these code changes are approved and merged as-is, it would be beneficial creating a tracking issue for this TODO(s) probably by the assignee to ensure we don't lose track of it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, why did we ever start the chain watcher in the newServer function, which should just do init.
|
I've heard all of your requests @hsjoberg, @shocknet-justin, @yyforyongyu regarding the importance of not blocking the LND RPC server while fully syncing headers. Could you please help evaluate the new updated changeset to see if it's working as expected? I've tested it and it seems to be working, but I may have done that evaluation in a tight loop |
2fc8729 to
491c3d2
Compare
|
The last force push after inviting for changes evaluation is just related to git commit msg title |
Hi @mohamedawnallah. I tried the new solution and it's working well for me inside Blixt. Everything is working as expected! |
|
To add a bit more context, the When started, it'll create a new rescan using the backend: lnd/chainntnfs/neutrinonotify/neutrino.go Lines 178 to 239 in 9f61140
The start height of this rescan is based on the current best height of the p2p node: lnd/chainntnfs/neutrinonotify/neutrino.go Lines 178 to 186 in 9f61140
If we start this right in The correct ordering is that we only call into |
491c3d2 to
41ba412
Compare
…re rescan This commit addresses a regression where Neutrino rescanning starts from an outdated height (~100k blocks behind) instead of using the current synced height. Root Cause: In commit 16a8b62, the initialization order was changed so that Chain Notifier starts before wallet syncing completes. This means the rescan begins using the stale height from BuildChainControl rather than the fully synced height. Old behavior (commit 1dfb5a0): 1. RPC server starts 2. Headers sync as part of daemon server 3. Chain Notifier starts after sync completes 4. Rescan begins from current (synced) height Current behavior (regression): 1. Chain Notifier starts in newServer (before RPC) 2. Wallet sync happens after RPC server starts 3. Rescan uses outdated height from BuildChainControl Solution: - Ensure headers are fully synced before starting the chain notifier, and after starting the RPC server. - Move chain notifier startup to its correct location after headers are fully synced. - Make sure the starting beat is lazily called after chain notifier started and before that starting beat result is used.
41ba412 to
f761dc4
Compare
|
This test failure is unrelated to the changeset this PR introduces. It is unconcerning as well; It seems a transient error: |
|
Gave this a try to and it seems to work, thank you. |
Thanks @shocknet-justin for confirming the solution is working! Your responsive feedback, along with input from @hsjoberg, was important in resolving this long-standing issue 🙇♂️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🤴🏾
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM walked through the problem with an LLM to understand it better:
⏺ Summary: Neutrino Rescan Optimization Fix
The Real Problem
When LND starts with Neutrino backend, there are two independent rescanning processes that serve different purposes:
1. BTCWallet Rescan (Wallet Transaction History)
- Purpose: Rebuild wallet's transaction history and UTXO set
- Scans from: Wallet birthday → current chain tip
- When: During BuildChainControl → Wallet.Startup() → SynchronizeRPC()
- Status: Runs in background during the IsSynced() wait loop
2. ChainNotifier Rescan (Lightning Event Monitoring)
- Purpose: Provide real-time notifications for Lightning operations (channel confirmations, HTLC tracking, spend notifications)
- Scans from: Starting height → forward
- When: At ChainNotifier.Start()
- Status: Processes every block it encounters
The Regression
In commit 16a8b623b, the ChainNotifier startup was moved to happen inside newServer(), which occurs before the header sync completes.
Broken Flow (OLD CODE):
1. BuildChainControl
└─ Neutrino starts, DB at height: 750,000
└─ BTCWallet begins background rescan from birthday
2. newServer()
└─ ChainNotifier.Start() ← TOO EARLY!
└─ Reads BestBlock() → 750,500 (mid-sync!)
└─ Begins processing blocks from 750,500
3. RPC Server starts
4. IsSynced() wait loop (750,500 → 850,000)
├─ Neutrino downloads headers & filters
├─ BTCWallet rescans wallet history
└─ ChainNotifier processes EVERY intermediate block:
750,501... 750,600... 751,000... 800,000... 849,999... 850,000
(100,000 blocks processed unnecessarily!)
5. "Chain backend is fully synced!" at 850,000
The Waste:
For every intermediate block during sync (750,500 → 850,000):
- ✅ Compact filter downloaded by Neutrino (~25 KB)
- ✅ ChainNotifier receives OnFilteredBlockConnected notification
- ✅ ChainNotifier processes the notification:
- Function call overhead (onFilteredBlockConnected() → connectFilteredBlock())
- Updates internal state (bestBlock height/hash tracking)
- Checks watch registry for pending notifications
- Dispatches to block epoch subscribers
- Updates transaction notifier state
- ❌ Nothing relevant found (Lightning channels operate at current tip!)
- ❌ CPU cycles wasted processing 100,000+ irrelevant block notifications
- ❌ Potential false positives in compact filters could trigger unnecessary full block downloads
Impact:
- Wasted processing: ~100,000 block notifications (function calls + state updates)
- Bandwidth overhead: ~2.5 GB of compact filters downloaded by Neutrino (unavoidable)
- Extra bandwidth risk: Full blocks downloaded only if filters match watched items (rare but possible)
- Delayed startup: 10-60 minutes of unnecessary notification processing
- Poor UX: Wallet appears "stuck" syncing
Important Note on Bandwidth:
ChainNotifier does NOT download all full blocks! The overhead is primarily CPU-based:
- Neutrino downloads headers (8 MB) + compact filters (~2.5 GB) for all blocks regardless
- ChainNotifier processes the notifications for each block (lightweight metadata)
- Full blocks (1-4 MB each) are only downloaded when compact filters match watched addresses/UTXOs
- For a new wallet or node with no active channels, filter matches are rare/non-existent
- The real waste is 100,000+ function calls and state updates, not bandwidth
The Fix
Move ChainNotifier.Start() to occur after the IsSynced() wait completes, ensuring headers are fully synced before ChainNotifier begins monitoring.
Fixed Flow (NEW CODE):
1. BuildChainControl
└─ Neutrino starts, DB at height: 750,000
└─ BTCWallet begins background rescan from birthday
2. newServer()
└─ ChainNotifier NOT started yet ✅
3. RPC Server starts
4. IsSynced() wait loop (750,000 → 850,000)
├─ Neutrino downloads headers & filters
├─ BTCWallet rescans wallet history
└─ ChainNotifier NOT listening yet ✅
(No intermediate blocks processed!)
5. "Chain backend is fully synced!" at 850,000
6. server.Start()
└─ ChainNotifier.Start() ← CORRECT TIMING! ✅
└─ Reads BestBlock() → 850,000 (fully synced!)
└─ Monitors only NEW blocks from 850,000 onwards
Benefits:
- ✅ Zero unnecessary block processing during sync
- ✅ Faster startup: Avoids 10-60 minutes of wasted notification processing per 100k blocks
- ✅ Reduced CPU usage: No function calls or state updates for intermediate blocks
- ✅ Same bandwidth: Neutrino still downloads filters (required), but ChainNotifier doesn't process them
- ✅ Better UX: Wallet reaches "ready" state faster
- ✅ Wallet history still complete: BTCWallet rescan happens independently
Key Insight
The fix recognizes that:
- BTCWallet needs historical data (wallet transactions from birthday)
- ChainNotifier only needs current data (Lightning operations at chain tip)
By delaying ChainNotifier startup until after sync completes, we avoid processing 100,000+ irrelevant block notifications while still maintaining complete wallet transaction history
through BTCWallet's independent rescan process.
The optimization is primarily CPU-focused, not bandwidth-focused. Neutrino downloads headers and compact filters regardless of when ChainNotifier starts. The savings come from
eliminating 100,000+ unnecessary function calls, state updates, and notification dispatches that ChainNotifier would perform for blocks that are irrelevant to Lightning operations.
Code Changes
Removed: startLowLevelServices() call in newServer() that started ChainNotifier early
Added: ChainNotifier.Start() moved to server.Start() after the sync wait completes
Result: ChainNotifier only processes blocks from the fully-synced chain tip onwards, eliminating unnecessary notification processing overhead during the initial header sync phase.
|
Just a disclaimer the LLM reasoning interpretation is not correct. For a correct reasoning interpretation can be found here #10280 (comment) and #10280 (comment) PS:
When we reach at BuildChainControl we are at 100,000ish header not 750,000. If the latter the wallet rescanning issue wouldn't be that much noticeable
It is ~900,000 wasted block processing not ~100,000
It is days not minutes under ideal conditions |
|
yeah I was more giving an example, if a node runner updated to the buggy software and had his neutrino node already header-synced to block 750k, but when starting up a new node obviously we totally ignore the wallet birthday and would sync from the beginning. Thank you for pointing that out. |





Change Description
The original regression caused by this changeset 16a8b623b.
Closes #10180.
Closes #9081.
Closes #8157 – Can't reproduce that issue since 18.3+ releases.
Resolves Discussion #10146.
(Can't Reproduce) Resolves Discussion #9094.
(Overlap) Resolves Discussion #10227.
(Perhaps Overlap) Resolves Discussion #9908.
Related Issues:
(Can't Reproduce) #9108.
(Perhaps Overlap) #10225.
Steps to Test
Steps for reviewers to follow to test the change.
Pull Request Checklist
Testing
Code Style and Documentation
[skip ci]in the commit message for small changes.📝 Please see our Contribution Guidelines for further guidance.